Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improves detection of PyPI package names in environment dependencies #1699

Merged
merged 4 commits into from
Aug 21, 2024

Conversation

andrewnester
Copy link
Contributor

Changes

Improves detection of PyPi package names in environment dependencies

Tests

Added unit tests

@pietern pietern changed the title Improves detection of PyPi package names in environment dependencies Improves detection of PyPI package names in environment dependencies Aug 20, 2024
@andrewnester andrewnester enabled auto-merge August 20, 2024 13:27
bundle/libraries/local_path.go Outdated Show resolved Hide resolved
// ^[a-zA-Z0-9\-_]+: Matches the package name, allowing alphanumeric characters, dashes (-), and underscores (_).
// \[.*\])?: Optionally matches any extras specified in square brackets, e.g., [security].
// ((==|!=|<=|>=|~=|>|<)\d+(\.\d+){0,2}(\.\*)?)?: Optionally matches version specifiers, supporting various operators (==, !=, etc.) followed by a version number (e.g., 2.25.1).
// Spec for package name and version specifier: https://pip.pypa.io/en/stable/reference/requirement-specifiers/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for including the spec.

bundle/libraries/local_path.go Outdated Show resolved Hide resolved
@andrewnester andrewnester requested a review from pietern August 21, 2024 08:15
@andrewnester andrewnester added this pull request to the merge queue Aug 21, 2024
Merged via the queue into main with commit c775d25 Aug 21, 2024
5 checks passed
@andrewnester andrewnester deleted the fix/env-deps-compare-ops branch August 21, 2024 08:30
andrewnester added a commit that referenced this pull request Aug 21, 2024
CLI:
 * Added filtering flags for cluster list commands ([#1703](#1703)).

Bundles:
 * Remove reference to "dbt" in the default-sql template ([#1696](#1696)).
 * Pause continuous pipelines when 'mode: development' is used ([#1590](#1590)).
 * Add configurable presets for name prefixes, tags, etc. ([#1490](#1490)).
 * Report all empty resources present in error diagnostic ([#1685](#1685)).
 * Improves detection of PyPI package names in environment dependencies ([#1699](#1699)).
 * [DAB] Add support for requirements libraries in Job Tasks ([#1543](#1543)).
 * Add paths field to bundle sync configuration ([#1694](#1694)).

Internal:
 * Add `import` option for PyDABs ([#1693](#1693)).
 * Make fileset take optional list of paths to list ([#1684](#1684)).
 * Pass through paths argument to libs/sync ([#1689](#1689)).
 * Correctly mark package names with versions as remote libraries ([#1697](#1697)).
 * Share test initializer in common helper function ([#1695](#1695)).
 * Make `pydabs/venv_path` optional ([#1687](#1687)).
 * Use API mocks for duplicate path errors in workspace files extensions client ([#1690](#1690)).
 * Fix prefix preset used for UC schemas ([#1704](#1704)).
github-merge-queue bot pushed a commit that referenced this pull request Aug 22, 2024
CLI:
* Added filtering flags for cluster list commands
([#1703](#1703)).

Bundles:
* Remove reference to "dbt" in the default-sql template
([#1696](#1696)).
* Pause continuous pipelines when 'mode: development' is used
([#1590](#1590)).
* Add configurable presets for name prefixes, tags, etc.
([#1490](#1490)).
* Report all empty resources present in error diagnostic
([#1685](#1685)).
* Improves detection of PyPI package names in environment dependencies
([#1699](#1699)).
* [DAB] Add support for requirements libraries in Job Tasks
([#1543](#1543)).
* Add paths field to bundle sync configuration
([#1694](#1694)).

Internal:
* Add `import` option for PyDABs
([#1693](#1693)).
* Make fileset take optional list of paths to list
([#1684](#1684)).
* Pass through paths argument to libs/sync
([#1689](#1689)).
* Correctly mark package names with versions as remote libraries
([#1697](#1697)).
* Share test initializer in common helper function
([#1695](#1695)).
* Make `pydabs/venv_path` optional
([#1687](#1687)).
* Use API mocks for duplicate path errors in workspace files extensions
client ([#1690](#1690)).
* Fix prefix preset used for UC schemas
([#1704](#1704)).
Copy link

@GeroSalas GeroSalas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change recently introduced is causing all my cicd jobs to fail

@andrewnester
Copy link
Contributor Author

@GeroSalas could you please describe what's the failure is and what error do you get?

@GeroSalas
Copy link

@andrewnester sure!

Basically I defined an env var ${var.signol_lib_package_version} with the value signol_lib-0.4.4-20240822+prod-py3-none-any.whl

so I reference it dinamically in the YAML of the job tasks where I require it, like below:

libraries:            
    - whl: ${workspace.root_path}/files/dist/${var.signol_lib_package_version}

pyproject.toml

signol-lib = {path = "dist/signol_lib-0.4.4-20240822+prod-py3-none-any.whl"}

All the setup was working fine. Job tasks were executing fine and the compute was installing the right libs, but suddenly it stopped working, so when I went into the cicd logs now I see these 2 new lines appeared in my AWS codebuild logs:

Building signol_lib...
Uploading signol_lib-0.4.4-py3-none-any.whl...

And definietly yes, the signol_lib package was overwritten again because this recent change introduced picked up but referenced the wrong whl full name and reupload it incorrectly so my tasks cannot find the correct one anymore.

@andrewnester
Copy link
Contributor Author

@GeroSalas Just to confirm few things:

  1. did it work on version 0.226.0?
  2. Do you have an artifacts section in bundle configuration defined for this library?

@GeroSalas
Copy link

@andrewnester

  1. Yes, it was working fine in v0.226.0
  2. No

@andrewnester
Copy link
Contributor Author

@GeroSalas can you please share the full bundle YAML configuration? I can't seems to reproduce the error so far, so something else might be missing.
Also, do you have setup.py file in your bundle root directory?

@mfleuren
Copy link

mfleuren commented Aug 23, 2024

@andrewnester

I'm experiencing a similar issue as @GeroSalas where all our integration tests cannot be validated,deployed nor run. Everything was working fine in v0.226.0 but since updating to v0.227.0 it stopped working and gives a Error: Python wheel tasks require compute with DBR 13.3+ to include local libraries. Please change your cluster configuration or use the experimental 'python_wheel_wrapper' setting. See https://docs.databricks.com/dev-tools/bundles/python-wheel.html for more information. for me. Rolling back to 0.226.0 for now, as that still works. (Thus, error handling also seems incomplete/incorrect as the DBR has nothing to do with this)

We do have a setup.py in the bundle's root directory for another purpose, but are running the DAB using a direct link with GIT. The branch the DAB is based on, is passed on using an variable.

Relevant code snippets:

Databricks.yml

bundle:
    name: wise

include:
    - resources/*.yml

variables:
  integration_branch:
    description: The source branch of the PR for the integration test
    default: development

targets:    
    integration:
      mode: production
      git:
          branch: ${var.integration_branch}
      workspa<host>
          root_path: /Shared/${bundle.name}/${bundle.target}
      run_as:
          user_name: <user>

azure-pipelines.yml

resources:
    containers:
        - container: pycontainer
          image: databricksruntime/standard:10.4-LTS

steps:
    - script: |

          cd ./wise_bundle/wise

          export DATABRICKS_HOST=$(URL)
          export DATABRICKS_TOKEN=$(PAT)

          echo $(System.PullRequest.TargetBranch)

          if [ "$(System.PullRequest.TargetBranch)" == "refs/heads/development" ]
          then
            echo "Deploying bundles to the integration environment"

            # Get the source branch
            branch=$(System.PullRequest.SourceBranch)
            echo $branch
            branch_name=${branch#refs/heads/}
            echo $branch_name
            
            # Deploy integration workflow using the source branch
            databricks bundle validate -t integration
            databricks bundle deploy -t integration --force-lock --var="integration_branch=$branch_name"
            databricks bundle run -t integration <flow_name> --no-wait
          fi

      workingDirectory: $(Build.SourcesDirectory)
      target: pycontainer
      displayName: "Run integration test"

@andrewnester
Copy link
Contributor Author

@mfleuren do you have a libraries or environments section in your DABs config files where you reference any libraries? Could you share this section?

@mfleuren
Copy link

mfleuren commented Aug 23, 2024

@andrewnester we do have libraries defined for individual tasks, f.i.

- task_key: deploy_model_on_dsia
                  depends_on:
                      - task_key: daily_inference
                  notebook_task:
                    notebook_path: wise_bundle/wise/src/deployment/deploy_api
                    source: GIT
                  job_cluster_key: wise_etl_cluster
                  max_retries: 1
                  min_retry_interval_millis: 60000
                  libraries:
                      - pypi:
                            package: ${var.msal-package}
                      - pypi:
                            package: ${var.requests-package}

Where the packages are defined centrally, in this case:

  requests-package:
    description: PyPi package
    default: requests==2.25.1
  msal-package:
    description: PyPi package
    default: msal==1.28.0

All defined packages are publicly available on Pypi.

@andrewnester
Copy link
Contributor Author

Thanks for the details! Indeed, it's a bug on our side which is fixed in this PR #1717
It will be released in the next CLI release

github-merge-queue bot pushed a commit that referenced this pull request Aug 26, 2024
## Changes
Fixes issue introduced here #1699
where PyPi packages were treated as local library.

The reason is that `libraryPath` returns an empty string as a path for
PyPi packages and then `IsLibraryLocal` treated empty string as local
path.

Both of these functions are fixed in this PR.

## Tests
Added regression test
@mfleuren
Copy link

@andrewnester awesome, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants